Abstract Prior work on routing in delay tolerant networks
(DTNs) has commonly made the assumption that each pair of
nodes shares the same inter-contact time distribution as every
other pair. The main argument in this paper is that researchers
should also be looking at heterogeneous inter-contact time dis- 
tributions. We demonstrate the presence of such heterogeneity 
in the often-used Dartmouth Wi-Fi data set. We also show that 
DTN routing can benefit from knowing these distributions. We 
first introduce a new stochastic model focusing on the inter- 
contact time distributions between all pairs of nodes, which 
we validate on real connectivity patterns. We then analyti- 
cally derive the mean delivery time for a bundle of information 
traversing the network for simple single copy routing schemes. 
The purpose is to examine the theoretic impact of heteroge- 
neous inter-contact time distributions. Finally, we show that 
we can exploit this user diversity to improve routing perfor- 
mance. 



1 Introduction 

In the kind of delay tolerant networks (DTNs) [1] that we
consider in this paper, nodes are mobile and have wireless net- 
working capabilities. They are able to communicate with each 
other only when they are within transmission range. The net- 
work suffers from frequent connectivity disruptions, making 
the topology only intermittently and partially connected. This 
means that there is a very low probability that an end-to-end 
path exists between a given pair of nodes at a given time. Such 
DTNs can be considered in ad hoc networking when connectivity
is very low (e.g. in tactical military communications), in transportation
systems as in the DieselNet project [2], or in Pocket
Switched Networks (PSN) [3], which are formed by devices
that people carry every day (cell phones, PDAs, music players).
In all these contexts, end-to-end paths can exist temporarily, or 
may sometimes never exist, with only partial paths emerging. 
This paper addresses the extreme case, where only temporal 
paths exist. We call such networks temporal DTNs, or t-DTNs. 
When a node in a t-DTN receives a "bundle" of information 
from a neighboring node, it keeps it until it meets another 
node which provides an opportunity to relay the bundle. The 
bundle is transferred from one node to another instantly and
this transfer is atomic. 

Prior work on routing in t-DTNs has commonly made the 
assumption that each pair of nodes shares the same inter-contact 
time distribution as every other pair. The main argument in
this paper is that researchers should also be looking at cases
in which inter-contact time distributions are heterogeneous.
Chaintreau et al. [3] posit that there might be heterogeneity,
but we show it and characterize it. We also show how exponential
distributions can be composed to yield the heavy-tailed
distributions that Chaintreau et al. observed. As we shall
see, the heterogeneity that we highlight allows us to usefully
extend the work of Spyropoulos et al. [4,5], which analyzes
numerous routing schemes for t-DTNs but uses mobility
models that yield homogeneous distributions.

We show, on the well known Dartmouth Wi-Fi data set [6],
that despite the existence of a heavy-tailed distribution when 
inter-contact times are considered in the aggregate, a large por- 
tion of the node pairs present inter-contact time distributions 
that can be well fitted by an exponential distribution. We found 
these distributions to be heterogeneous, with a wide variation 
in exponents. 

We also provide the first formal analysis of the impact 
of heterogeneous exponential inter-contact time distributions 
on simple single-copy routing schemes. We show that routing 
strategies can benefit, in terms of delay, from this heterogene- 
ity, and in particular from knowing these distributions. A node 
can choose among possible relay nodes based upon their ex- 
pectations for meeting other relays or the destination. 

2 Inter-contact time model 

This section presents the model we use to analytically de- 
rive the delay expectations for the routing protocols we study 
later in this paper. 

2.1 Exponential t-DTNs 

We consider a network composed of $n$ nodes. Let us first
look at the inter-contact time between two individual nodes
$(i, j)$: $t_{ij}^{0} < t_{ij}^{1} < t_{ij}^{2} < \ldots$ are the successive instants at which a
contact between $i$ and $j$ occurs.

$\Delta t_{ij}^{k} = t_{ij}^{k+1} - t_{ij}^{k}$ (1)

is the inter-contact time between the $k$th and $(k+1)$th contact
instants.

We assume that the $\Delta t_{ij}^{k}$ are samples from independent and
identically distributed random variables that follow an exponential
law with parameter $\lambda_{ij}$, which we note $\tau_{ij} = \mathrm{exponential}(\lambda_{ij})$.
The mean inter-contact time between $i$ and $j$ is thus given by
$E[\tau_{ij}] = 1/\lambda_{ij}$.

In the overall network, all $n$ nodes are supposed to behave
independently, so that the $n(n-1)/2$ pairwise inter-contact
times $\tau_{ij}$ are independent exponential processes with different
parameters. The $\tau_{ij}$ family of processes is symmetric and
$\forall i$, $\tau_{ii} = 0$. The exponential t-DTN is thus entirely and uniquely
characterized by the $n(n-1)/2$ strictly positive real parameters
$\lambda_{ij}$.
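
As an illustration of this model, the following minimal Python sketch draws successive contact instants for a single pair from an exponential inter-contact law and compares the empirical mean inter-contact time with $1/\lambda_{ij}$. The helper name `contact_instants`, the rate value, and the seed are assumptions made for this example; they are not taken from the study.

```python
import random

def contact_instants(lam, count, rng):
    """Return `count` successive contact instants for one node pair.

    Inter-contact times are i.i.d. exponential with rate `lam`
    (hypothetical helper written for this illustration).
    """
    t = 0.0
    instants = []
    for _ in range(count):
        t += rng.expovariate(lam)  # add the next inter-contact time
        instants.append(t)
    return instants

rng = random.Random(0)        # fixed seed, for reproducibility only
lam = 1.0 / 3600.0            # assumed rate: one contact per hour on average
times = contact_instants(lam, 20000, rng)
gaps = [b - a for a, b in zip(times, times[1:])]
empirical_mean = sum(gaps) / len(gaps)  # should be close to 1 / lam
```

With enough samples the empirical mean should approach $1/\lambda_{ij}$, which is the only property of the pair that the model retains.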

2.2 Assumptions 

The model focuses on the temporal dynamics of node con- 
nectivity in a DTN. In this way it provides a common frame- 
work to analyse different DTNs. In particular it applies very 
well to social networks for which the position of nodes at 
a given time is not of primary importance. We believe this 
abstraction helps focus on the inherent characteristics of in- 
termittent connectivity in DTNs. 

Characterizing inter-contact time behavior helps abstract 
away from the spatial information that is essential in the anal- 
ysis of mobile ad hoc networks. There is no reference to ge- 
ographic, localisation or any other such spatial information. 
There is also no reference to air interface parameters, quality 
of or contention on the links, etc. Node mobility is not explic- 
itly modelled: only its aggregated impact on the inter-contact 
time is taken into account. 

The model makes a stationarity hypothesis with respect 
to node inter-contact time distributions. In other words, node
behaviors are assumed to change on a slower scale than bundle
exchanges. We also suppose that nodes have infinite capacity 
in bandwidth and storage. 

Another key hypothesis is that contacts (and thus bundle 
transfers) are assumed to be instantaneous. In the model, pair- 
wise contacts do not overlap: in contrast to the mobile ad hoc 
network cases, no partial routes (involving more than two nodes) 
exist at any given time. 

In this respect, the results with the proposed model are 
upper bounds, but, as we will see, still provide valuable in- 
formation and insight on how to route bundles in DTNs. We 
leave refinements of the model for future work. 

3 Fitting the model 

In the t-DTN model just elaborated, we assume that the 
inter-contact time distribution for each pair of nodes is expo- 
nential. The main reason is that it will allow us to go beyond 
asymptotic results and provide explicit formulas for the bun- 
dle delivery time, and other parameters, of different routing 
protocols. In this section we look at real data to evaluate how 
reasonable this hypothesis might be. 

3.1 Experimental data set 

To validate the hypothesis, we use real data from the Wi-Fi
access network of Dartmouth College [6]. These data track
users' sessions in the wireless network by showing the time
at which a node associates or dissociates from an access point.
We use the subset of data pre-processed by Song et al. for their
prior work [7] on mobility prediction.



As we describe in prior work [8], we must select from
the data, and make some assumptions, in order to constitute
a useful DTN data set. We take the subset of users who are
present in the network every day between January 26th, 2004
and March 11th, 2004, a class period during which we expect
nodes' activity to be fairly stationary. This data set contains
834 users, or nodes. Then, we assume that two nodes are in
contact if they are present at the same time at the same access
point (AP). Finally, we filter these data to remove the
well known ping-pong effect. Indeed, wireless nodes, even
non-mobile ones, can oscillate at a high frequency between two
APs. To counter this, we filter out all the inter-contact times below
1,800 seconds. Note that defining better filtering methods,
albeit challenging, would be of interest for the community.
As this is not the purpose of this work, we choose here the
threshold that Yoon et al. [9] used for the same purpose. We
use this new data set for the remainder of this paper.

The Wi-Fi scenario may not be a perfect fit for interactions
between nodes in t-DTNs. Indeed, in contrast to always-on
devices carried by humans, Wi-Fi nodes are typically turned
off, transported, and then turned on again, thus missing potential
contacts en route. However, the size, quality, and public
availability of the data set make it nonetheless one of the best
resources for this kind of study. Jones et al. [10] and Chaintreau
et al. [3] recently used these traces in a similar way.

3.2 Exponential inter-contacts 

Fig. 1 shows the distribution of $E[\tau_{ij}]$, the expected inter-contact
time for the pair of nodes $(i, j)$. This plot has been
computed over all the 28,490 source-destination pairs that experienced
an average inter-contact time lower than one week
within the two-month period that we considered in the Dartmouth
data. We can see that the distributions are heterogeneous, with
expectations varying over three orders of magnitude. The average
$E[\tau]$ is 11.6 hours with a variance of 7.1 hours.




Fig. 1 Distribution of $E[\tau]$ (horizontal axis: $E[\tau]$ in seconds).

Then, we test whether the inter-contact process between
any two nodes can be modelled by an exponential process
with a parameter $\lambda = 1/E[\tau]$. We use the Cramer-Smirnov-Von-Mises
[11] hypothesis test. For each pair $(i, j)$, we compare
the cumulative distribution of the inter-contacts
observed with the hypothesis function whose cumulative distribution
is $F_{ij}(x) = 1 - \exp(-\lambda_{ij} x)$. We also compare with
that of a power law distribution. Note that we only perform the
computation for pairs that show a sufficient level of connectivity
by having a mean inter-contact time lower than one week
and that have more than 20 contacts. We identify 8,402 pairs
to be exponentially distributed and 28 to follow a power law, which
makes, respectively, 62.3% and 0.2% of the 13,482 pairs that
we retain for the test.
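
To make the goodness-of-fit step concrete, the sketch below computes the standard Cramér-von Mises statistic $W^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left(F(x_{(i)}) - \frac{2i-1}{2n}\right)^2$ against the exponential hypothesis $F(x) = 1 - \exp(-\lambda x)$. The sample values are hypothetical, and the acceptance threshold is left out because it is not stated in the text.

```python
import math

def cvm_statistic_exponential(samples, lam):
    """Cramér-von Mises W^2 statistic against F(x) = 1 - exp(-lam * x)."""
    xs = sorted(samples)
    n = len(xs)
    w2 = 1.0 / (12.0 * n)
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-lam * x)            # hypothesized CDF at x
        w2 += (f - (2.0 * i - 1.0) / (2.0 * n)) ** 2
    return w2

# Hypothetical inter-contact times (seconds) for one pair.
samples = [2100.0, 5400.0, 9800.0, 15000.0, 31000.0, 47000.0]
lam_hat = len(samples) / sum(samples)           # lambda = 1 / E[tau]
w2 = cvm_statistic_exponential(samples, lam_hat)
```

Smaller values of `w2` indicate a closer match between the empirical and the hypothesized cumulative distributions; a pair would be retained as exponential when the statistic falls below the critical value chosen for the test.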

From these observations, it seems clearly more reasonable,
in this data set, to model pairwise inter-contact time distributions
as exponential rather than power law, since a large number
of pairs have shown exponentially distributed inter-contact
times. Although few pairs are power law distributed, we conjecture
that the remaining pairs might follow distributions that are
a mix of exponential and power law distributions. As we have
examined only one data set, albeit an often-used one, we cannot
draw many conclusions about what will be revealed elsewhere.
It is reasonable to expect that other mobility traces in
campus environments will show similar characteristics. However,
it is surprising that a memoryless process seems to be at
work in such a high proportion of node pairs in an environment
in which one would expect some temporal correlations. We
hope this will be a spur to study these distributions in other
data sets. Traces from the Haggle project [3] or the ones of the
Reality Mining project [12] might be considered. We leave this
study for future work.

3.3 Power laws 

Chaintreau et al. [3] observed that aggregated inter-contact
times follow power laws in a number of DTN traces (also
including one based on the Dartmouth data). Fig. 2 shows
that, for our data set, the cumulative distribution of aggregated
inter-contact times also follows a power law of the form
$f(x) = c x^{s}$, with exponent $s = -0.16$ and scale parameter
$c = 3.45$.
Fig. 2 Distribution of inter-contacts in logarithmic scale.

Let's now consider what happens for pure exponential t- 
DTNs where all pairwise inter-contact time distributions are 
exponentially distributed. Under which conditions do the ag- 
gregated inter-contact time distributions follow a power law, 
or is the pairwise exponential assumption too strong to yield a 
power law in the aggregate? 

Let $\Theta$ be the aggregated inter-contact time for all pairs of
nodes, and let $p(\lambda)$ be the probability distribution of the $\lambda$
parameters:

$P(\Theta > t) = \int_{\lambda=0}^{\infty} e^{-\lambda t} p(\lambda) \, d\lambda$ (2)
What Eqn. 2 says is that, for exponential t-DTNs, the aggregated
inter-contact time distribution is fully characterized
by the distribution of the $\lambda$ parameters, and thus of the $E[\tau_{ij}]$
matrix. More precisely, the tail cumulative distribution of the
aggregated inter-contact times is given by the Laplace transform
of the distribution $p$ of the $\lambda$ parameters.

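A Pareto law of the form $\left(\frac{b}{b+t}\right)^{a}$, $t \geq 0$, with shape parameter
$a > 0$ and scale parameter $b > 0$, is observed if and only if
the $\lambda$ follow a gamma distribution $p(\lambda) = \frac{b^{a} \lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}$, $\lambda \geq 0$.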
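
The relation between a gamma distribution of rates and a Pareto tail can be checked numerically. The sketch below takes the Pareto tail to be $(b/(b+t))^a$ and the rates to be gamma distributed with shape $a$ and rate $b$, then compares a Monte Carlo estimate of $E[e^{-\lambda t}]$ with the closed form. The evaluation point `t`, the number of draws, the seed, and the assumption that $b$ is expressed in seconds are choices made for this example.

```python
import math
import random

def pareto_tail(t, a, b):
    """Closed-form tail (b / (b + t))^a."""
    return (b / (b + t)) ** a

def mixture_tail(t, a, b, draws, rng):
    """Monte Carlo estimate of E[exp(-lambda * t)], lambda ~ gamma(shape a, rate b)."""
    total = 0.0
    for _ in range(draws):
        lam = rng.gammavariate(a, 1.0 / b)  # second argument is the scale, 1 / b
        total += math.exp(-lam * t)
    return total / draws

rng = random.Random(1)       # fixed seed, for reproducibility only
a, b = 2.26, 113766.9        # values reported in the text
t = 86400.0                  # arbitrary evaluation point (one day if b is in seconds)
estimate = mixture_tail(t, a, b, 50000, rng)
exact = pareto_tail(t, a, b)
```

The two values should agree up to sampling noise, which illustrates how purely exponential pairs can produce a heavy-tailed aggregate.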

To verify this on the data set we proceed in the following
way: we estimate parameters $a$ and $b$ from the cumulative distribution
of the $\lambda$ parameters for pairs that were shown to follow
an exponential behavior (the ones that pass the Cramer hypothesis
test). We find $b = 113,766.9$ and $a = 2.26$. Fig. 3(a)
shows the estimated cumulative gamma distribution $g(x)$ with
the experimental $\lambda$ cumulative distribution for all pairs
that have been shown to be exponential. Then, we plot in Fig. 3(b)
the corresponding power law $h(t)$ with the cumulative distribution
of aggregated inter-contact times. As one can see, the two
experimental curves fit the theoretical curves.
Fig. 3 Distributions with exponential pairs.

What this result shows is that when one considers an expo- 
nential t-DTN, we can regain the power law behavior for the 
aggregated inter-contacts when the distribution of the param- 
eters is a gamma, which is the case in the data we used when 
considering the subset of pairs that have inter-contact times 
exponentially distributed. 



4 Single copy routing strategies 

Having defined a stochastic model that is realistic for the
data set under study, we now examine different simple single
copy routing strategies. We derive analytical formulas that we
will use to study the impact of heterogeneous $\lambda_{ij}$ parameters
on routing.

In all routing strategies, we consider that nodes know all 
pairwise mean inter-contact times for all nodes in the network, 
i.e., each node knows the $\lambda_{ij}$ matrix. This knowledge could be
diffused through an epidemic type of routing, or learned by 
each node from past contacts. 
4.1 Wait scheme

Under the Wait routing strategy, the source node s waits 
until it meets d, the destination, to deliver the bundle in one
hop. 

If the bundle is injected at time $t$, its delivery time is equal
to $R_{sd}^{t}$, the remaining inter-contact time before the next contact
between nodes $s$ and $d$. The memoryless nature of exponentials
implies that $R_{sd}^{t}$ also follows an exponential distribution
with the same parameter. The mean expected delivery time is thus
given by:

$E[D_{sd}^{wait}] = 1/\lambda_{sd}$ (3)

This straightforward result gives an upper bound on the 
delivery time that a routing strategy should meet, since the 
Wait strategy is the most rudimentary one hop single copy 
scheme. 

4.2 MED 

The Minimum Expected Delay (Med) routing strategy was 
first introduced by Jain et al. [13]. This strategy, similar to
source routing, defines which path the bundle will follow from 
s to d, that is, the ordered list of intermediate relay nodes it will 
have to go through. The list is chosen to provide minimum 
expected end-to-end delay. 

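If a path is given by the following ordered list of nodes
$r_0 = s < r_1 < r_2 < \ldots < r_{n-1} < r_n = d$, and relaying
occurs at time instants $t_1 < t_2 < \cdots < t_n$, the total delivery time
along path $(s, r_1, r_2, \ldots, r_{n-1}, d)$ is given by the sum of the remaining
inter-contact times after each relaying instant, that is, with $t_0$
denoting the injection time:

$D_{s, r_1, \ldots, r_{n-1}, d}^{med} = \sum_{i=0}^{n-1} R_{r_i r_{i+1}}^{t_i}$ (4)

Using the fact that $E[R_{r_i r_{i+1}}^{t_i}] = 1/\lambda_{r_i r_{i+1}}$, the expected delivery
time along the path is thus given by:

$E[D_{s, r_1, \ldots, r_{n-1}, d}^{med}] = 1/\lambda_{s r_1} + 1/\lambda_{r_1 r_2} + \ldots + 1/\lambda_{r_{n-1} d}$ (5)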



Finding the optimal path thus amounts to finding a lowest-weight
path between nodes $s$ and $d$ in a graph in which the
weight on each link $(i, j)$ is defined as $1/\lambda_{ij}$. Dijkstra's
algorithm can be used.
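
The path selection just described can be sketched in Python with a standard heap-based Dijkstra search over link weights $1/\lambda_{ij}$. The function name `med_path`, the dense matrix representation, and the example rates are assumptions of this illustration.

```python
import heapq

def med_path(lam, s, d):
    """Lowest expected-delay path from s to d.

    `lam[i][j]` is the contact rate of pair (i, j); 0 means the pair never meets.
    Returns (expected_delay, path), or (inf, []) if d cannot be reached.
    """
    n = len(lam)
    dist = [float("inf")] * n
    prev = [None] * n
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue                          # stale queue entry
        if u == d:
            break
        for v in range(n):
            if v == u or lam[u][v] <= 0.0:
                continue
            cand = du + 1.0 / lam[u][v]       # link weight is 1 / lambda_uv
            if cand < dist[v]:
                dist[v] = cand
                prev[v] = u
                heapq.heappush(heap, (cand, v))
    if dist[d] == float("inf"):
        return float("inf"), []
    path = [d]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return dist[d], path[::-1]

# Hypothetical symmetric rate matrix for 4 nodes (contacts per hour).
lam = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.1, 2.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
delay, path = med_path(lam, 0, 3)
```

In this toy network the three-hop path through nodes 1 and 2 has a lower expected delay than the two-hop path through node 2 alone, because the direct contact between nodes 0 and 2 is rare.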

4.3 Spray and Wait routing 

The Spray and Wait strategy was first introduced by Grossglauser
and Tse [14], and is designed to take advantage of
opportunistic contacts. It consists of two steps. First the source 
node uses the first nodes encountered as relays to the destina- 
tion. This is the "spraying" step. A relay node then uses the 
"wait" strategy to relay the bundle, i.e. it waits until it meets 
the destination to deliver the bundle. Here, we study the case 
where only one relay is used, which we designate 1-SW. 

Let us first consider the spraying step. The bundle is injected
at source $s$ at time instant $t$. The first node $r$ it encounters
may be any of the $n - 1$ other nodes $d, r_1, r_2, \ldots, r_{n-2}$, and the
time $X$ it takes to meet this first node is the infimum of the
remaining inter-contact times with all other nodes:

$X = \inf(R_{sd}^{t}, R_{s r_1}^{t}, \ldots, R_{s r_{n-2}}^{t})$ (6)

Since all $R_{s r_i}^{t}$ are independent exponentials with parameters
$\lambda_{s r_i}$, we have (see [15, p. 328]):

- The random index $r$ of the first node encountered is independent
of the first encounter time $X$

- $X$ is exponentially distributed, with parameter:

$\Lambda_s = \lambda_{sd} + \sum_{i=1}^{n-2} \lambda_{s r_i}$

- $\Pr(\text{First node encountered is } r) = \lambda_{sr} / \Lambda_s$

This means that we can represent the spraying step as independently
identifying the encountered node (with probability
$\lambda_{sr} / \Lambda_s$) and adding an exponential waiting time with parameter
$\Lambda_s$.

Two cases may arise: either the first node encountered $r$
equals $d$, and $s$ delivers the bundle, or $r \neq d$ and node $r$ waits
to meet node $d$ to deliver the bundle.

The delivery time $Z_d$ when node $d$ is encountered first is
thus given by:

$E[Z_d] = \frac{1}{\Lambda_s}$ (7)

The delivery time $Z_r$ along path $r$, i.e., conditioned on
using node $r$ as a relay, is the sum of the first encounter
time $X$ and the remaining delivery time between nodes $r$ and
$d$, and thus:

$E[Z_r] = \frac{1}{\Lambda_s} + \frac{1}{\lambda_{rd}}$ (8)

The total delivery time $Z$ is computed by conditioning on
all possible first encountered nodes $d, r_1, r_2, \ldots, r_{n-2}$, events
whose probabilities are given by $\lambda_{sd}/\Lambda_s$ and $\lambda_{s r_i}/\Lambda_s$:

$E[Z] = \frac{\lambda_{sd}}{\Lambda_s} E[Z_d] + \sum_{i=1}^{n-2} \left( \frac{\lambda_{s r_i}}{\Lambda_s} E[Z_{r_i}] \right)$ (9)



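After simplification, we can state that: in a network composed
of $n$ nodes, 1-SW delivers a bundle from source $s$ to
destination $d$ with mean delivery time given by:

$E[D_{sd}^{1\text{-}SW}] = \frac{1}{\Lambda_s} \left( 1 + \sum_{r \neq s, r \neq d} \frac{\lambda_{sr}}{\lambda_{rd}} \right)$ (10)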
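
As a sanity check on the simplification, the sketch below computes the mean 1-SW delay in two ways: through the closed form $\frac{1}{\Lambda_s}\left(1 + \sum_{r \neq s, r \neq d} \lambda_{sr}/\lambda_{rd}\right)$, and by conditioning on the first node encountered. The function names and the three-node rate matrix are hypothetical.

```python
def one_sw_delay(lam, s, d):
    """Closed-form mean 1-SW delay from s to d."""
    n = len(lam)
    total_rate = sum(lam[s][r] for r in range(n) if r != s)  # Lambda_s
    if total_rate == 0.0:
        return float("inf")                # s never meets anyone
    ratio_sum = 0.0
    for r in range(n):
        if r in (s, d) or lam[s][r] == 0.0:
            continue
        if lam[r][d] == 0.0:
            return float("inf")            # a relay that never meets d strands the bundle
        ratio_sum += lam[s][r] / lam[r][d]
    return (1.0 + ratio_sum) / total_rate

def one_sw_delay_by_conditioning(lam, s, d):
    """Same quantity, conditioning on the first node met by s."""
    n = len(lam)
    total_rate = sum(lam[s][r] for r in range(n) if r != s)
    if total_rate == 0.0:
        return float("inf")
    # First node met is d: delay is the first encounter time only.
    expected = (lam[s][d] / total_rate) * (1.0 / total_rate)
    for r in range(n):
        if r in (s, d) or lam[s][r] == 0.0:
            continue
        if lam[r][d] == 0.0:
            return float("inf")
        # First node met is relay r: first encounter time plus wait for d.
        expected += (lam[s][r] / total_rate) * (1.0 / total_rate + 1.0 / lam[r][d])
    return expected

# Hypothetical rates: s = 0, relay = 1, d = 2.
lam = [
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 4.0],
    [1.0, 4.0, 0.0],
]
closed_form = one_sw_delay(lam, 0, 2)
conditioned = one_sw_delay_by_conditioning(lam, 0, 2)
```

Both computations give the same value on this example, and both return an infinite delay as soon as the source can hand the bundle to a node that never meets the destination.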



5 Comparing routing protocols 

This section looks at routing performance of the protocols 
we considered in the presence of heterogeneity in inter-contact 
time distributions. 

In this context, we present 1-SW*, a variation of 1-SW. Instead
of spraying its bundle to the first node that it encounters,
the source node $s$ sprays only to nodes in a subset $R$. We call
this a 1-SW$^R$ scheme. We define 1-SW* to be a 1-SW$^R$ scheme
which uses a subset $R$ that minimizes $E[D_{sd}^{1\text{-}SW^R}]$. Following
the same line of reasoning as in Sec. 4.3 and defining $1/\lambda_{dd} = 0$,
one finds that the expected delivery time is given by:

$E[D_{sd}^{1\text{-}SW^R}] = \frac{1}{\sum_{r \in R} \lambda_{sr}} \left( 1 + \sum_{r \in R} \frac{\lambda_{sr}}{\lambda_{rd}} \right)$ (11)

We performed simulations using Dartmouth traces (see Sec. 3)
to study how the algorithms behave in the case of heterogeneous
connectivity. We simulate the following protocols: Wait
and 1-SW, which are naive schemes, and 1-SW* and Med, which
are designed to take advantage of heterogeneity. We slightly
modified 1-SW, to better compare it with 1-SW*: a node $i$ is a
potential relay only if $\lambda_{id} > 0$, i.e., if it has a chance of meeting
the destination. In Med, we authorized intermediary relays to
directly transfer bundles to the destination whenever met.

We choose at random 100 different source-destination pairs
$(s, d)$ and replay the contacts between the 835 nodes present
in the data to see how, for each pair, a bundle generated at the
beginning of the two-month period is delivered.

The $\lambda$ values used for route selection in 1-SW* and Med, and
to determine the theoretical delays of 1-SW, 1-SW* and Med, have
been computed over the data filtered to remove the ping-pong effect
(see Sec. 3). However, the contacts replayed in simulations
were those in the original traces, as this does not impact the results
and because filtering was only of interest for modelling.





         delivery     A delay      M delay      th. delay    hop count
         ratio (%)    (days)       (days)       (days)       (hops)
Wait     11.2 ±0.9    19.8 ±3.7    16.3 ±10.3   41.3 ±0.5    1.0 ±0.0
1-SW     87.3 ±3.0    23.0 ±0.9    22.7 ±2.7    15.3 ±0.9    2.0 ±0.1
1-SW*    86.9 ±2.0    18.8 ±0.4    15.4 ±0.9    13.0 ±1.1    2.0 ±0.1
MED      87.9 ±2.2    20.9 ±0.7    18.0 ±1.1    1.3 ±0.2     7.2 ±0.2



Table 1 Simulation results with Dartmouth data. 



Table 1 presents the simulation results averaged over 5
runs with the 90% confidence levels that are obtained using the 
Student t distribution. It presents, for each of the protocols, the 
average delivery ratio, the average delay ("A delay") and the 
median delay ("M delay") computed over the delivered bun- 
dles, the average theoretical delay over all the bundles gener- 
ated (infinite delay is assumed to be the length of the simulated 
period, i.e. 45 days), and the average hop count, also obtained 
on delivered bundles. 
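
For reference, a 90% confidence half-width over 5 runs can be computed as $t_{0.95,4} \, s / \sqrt{5}$, where $s$ is the sample standard deviation and $t_{0.95,4} \approx 2.132$ is the tabulated Student t quantile for 4 degrees of freedom. The sketch below applies this to made-up per-run values; the numbers are not taken from the simulations.

```python
import math
import statistics

T_95_DF4 = 2.132  # Student t quantile at level 0.95 with 4 degrees of freedom (tabulated)

def mean_with_ci90(values):
    """Mean and two-sided 90% confidence half-width for exactly 5 runs."""
    if len(values) != 5:
        raise ValueError("the hard-coded quantile is only valid for 5 runs")
    mean = statistics.mean(values)
    half_width = T_95_DF4 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width

# Hypothetical per-run delivery ratios (%), for illustration only.
runs = [86.0, 88.5, 87.0, 89.0, 86.0]
mean, half_width = mean_with_ci90(runs)
```

The result would be reported in the table's "mean ± half-width" form.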

The major result in Table 1 is that schemes that make use
of heterogeneity of inter-contact times (1-SW* and Med) perform
better, either in delivery ratio or delay, than the ones
that do not exploit it (Wait and 1-SW). Wait only delivers
11.2% of bundles because most of the source-destination pairs
selected at random satisfy $\lambda_{sd} = 0$ (i.e., they never met). 1-SW,
1-SW* and Med achieve almost the same delivery ratios,
with respectively 87.3%, 86.9% and 87.9%. About 13% of
the bundles were thus not delivered. In terms of delay, among
these last three protocols, 1-SW shows the highest, with a mean
of 23.0 and a median of 22.7 days, and 1-SW* the lowest, with a
mean of 18.8 and a median of 15.4 days. Med appears to be in the
mid-range, with a mean of 20.9 and a median of 18.0 days.

The difference between the modified 1-SW and 1-SW*
gives a further insight on the type of heterogeneity that should
be considered. The modified 1-SW is a one-relay strategy that
uses only true relays to the destination: relay nodes in 1-SW
must meet both the source and the destination. The scheme is
not completely ignorant of heterogeneity, as it exploits binary
connectivity information, the fact that not all nodes meet one
another. 1-SW* goes beyond that and differentiates between
neighboring nodes based on the quantitative expected inter-contact
time. The fact that 1-SW* outperforms the modified
1-SW thus indicates that routing actually benefits from the
quantitative inter-contact time heterogeneity, and not just from
node connectivity.

Table 1 shows a discrepancy between the theoretical and
the experimental delays. This can be first explained by the
presence of node pairs that do not have an exponential behavior.
This is particularly true for 1-SW*. In this case the
computation of expected delays on mean inter-contact times
misses possible inter-dependencies of node contacts.

Simulation artifacts also come into play. The routing simulation
is carried out on a limited time scale. The $\lambda$ values
are computed over the entire data set in a prior pass, so a
relay node may meet the destination for the last time before
having met the source for the first time. Since this pre-computation
is not realistic, we could have used on-line predictive or
learning methods. However, as they are challenging to define,
we leave this study for future work and intend here to provide
early validation results to motivate research in the domain.
The fact that 1-SW* delivers slightly fewer bundles than 1-SW
is clearly due to this artifact. Indeed, because in 1-SW the
source transfers the bundle to a relay more rapidly, there is a
lower probability of missing contacts between that relay and the
destination. Med suffers from the same simulation artifact;
it would have delivered 100% of bundles otherwise.

Through these simulations, we validate the intuition
that we should take into account the heterogeneity of inter-contact
time distributions in the design of routing solutions for
t-DTNs. Furthermore, because 1-SW*, which is only a two-hop
protocol, achieves better performance than Med by delivering
bundles with lower delays and a lower impact on network
resources (Med delivers bundles in 7.2 hops on average
while 1-SW* uses only 2), we expect 1-SW* to inspire promising
future work. The opportunistic nature of
1-SW* is the main reason for this superiority over Med, in
which bundles follow a strict sequence of relays, in a network
which is not a perfect exponential t-DTN.

6 Conclusion 

We have first shown that, in a widely-used t-DTN data set, 
distributions of inter-contact times are heterogeneous. As a 
consequence, one has to take it into account while modeling. 
Second, we have validated the insight that considering het- 
erogeneity in routing improves performance. We presented a 
simple routing strategy, 1-SW*, adapted from the Spray and 
Wait scheme, which is capable of using only a subset of relays 
to improve routing performance, measured in terms of average
delay. 

Clearly, our work, based as it is upon one data set, will 
benefit from validation against others as mentioned in Sec. 3.
Work also needs to be done to examine why a memoryless 
model fits so many node pairs in an environment in which 
one would expect to find more temporal correlations. Finally, 
formal studies and validations should be conducted with more 
elaborate schemes, in terms of number of copies distributed or 
in terms of the number of hops traversed. 